A VIDEO CODING APPARATUS 
ACCORDING TO A FEATURE OF A VIDEO PICTURE 

BACKGROUND OF THE INVENTION 
Field of the Invention

The present invention relates to a video coding
apparatus and, more particularly, to a video coding apparatus
for performing coding by using motion compensatory
prediction of a digital video signal.

Description of the Related Art

Among highly efficient coding systems that code sequentially
input video signals with a smaller code quantity, the systems
that exploit the motion and correlation between video pictures
include motion compensatory prediction coding, which decodes
and reproduces a video picture coded in the past and uses
motion information derived per small block from that video
picture. One example of the conventional motion compensatory
prediction coding is illustrated in Fig. 1.

In Fig. 1, when an input video signal 1 of a first screen
is input, each of the switches is controlled to be connected
to a side (1) by a coding mode control section 12, and
the input video signal 1 is input directly into an orthogonal
transform unit 3. In order to achieve high coding efficiency,
the input video signal 1 is orthogonally transformed by using
DCT (discrete cosine transform) or the like in the orthogonal
transform unit 3. An orthogonal transform coefficient is
quantized by a quantizer 4. The resultant quantization
coefficient is converted into a variable length code such
as a Huffman code by a first variable length encoder 5, and
is then input into a video duplexer 15.

In the meantime, the quantization coefficient input
into an inverse quantizer 6 is inversely quantized, and then
video picture data is restored by an inverse orthogonal
transform unit 7. The restored video picture data is stored
in a frame memory 9. Moreover, coded data 13 transmitted
from the first variable length encoder 5 and quantization
information 18 transmitted from the quantizer 4 are duplexed
by the video duplexer 15, to be output as a coded video data
output 16.

When another input video signal 1 of a next screen is
input, each of the switches is controlled to be connected
to a contact on a side (2) by the coding mode control section
12, so that the input video signal 1 is input into a predictive
signal subtraction section 2 and a motion compensator 10.
In the motion compensator 10, a motion vector is detected
based on the input video signal 1 and a reference video picture
input from the frame memory 9, and is then input into a
position shifter 11 and a second variable length encoder
14. In the second variable length encoder 14, motion vector
information is converted into a variable length code such
as a Huffman code, and is then input into the video duplexer
15.

In the position shifter 11, a video signal designated
by the motion vector is extracted from the frame memory 9,
and thereafter is output as a motion compensatory predictive
signal to the predictive signal subtraction section 2 and
a local decoding addition section 8. In the predictive
signal subtraction section 2, the motion compensatory
predictive signal is subtracted from the input video signal
1, so that the prediction error is coded. The prediction
error signal is orthogonally transformed by using DCT
(discrete cosine transform) or the like in the orthogonal
transform unit 3 in order to achieve high coding efficiency.
The signal quantized by the quantizer 4 is converted into
a variable length code such as a Huffman code in the first
variable length encoder 5. In order to use the same
predictive signal as that on a decoding side, the
quantization coefficient obtained by the quantizer 4 is
inversely quantized by the inverse quantizer 6, and then
the prediction error signal is locally decoded by the
inverse orthogonal transform unit 7. Furthermore, the
motion compensatory predictive signal is added to the
decoded prediction error signal by the local decoding
addition section 8, and the sum is stored in the frame memory
9.

In view of convenience of highly efficient coding and
decoding reproduction, the video picture is coded by
combining three kinds of video coding systems for P, B and
I frames.

A minimum unit of video pictures, which is formed by
combining the three kinds of video coding systems and can
be decoded independently of other units, is referred to as
"a GOP (a Group of Pictures)". The combination of the coding
systems is referred to as "a GOP structure". The frame first
coded inside one GOP is coded by intra-frame coding (an I frame).
Fig. 2 illustrates an example of a GOP. In Fig. 2, the number
of frames included in one GOP is referred to as a GOP size,
and an interval between P frames or between an I frame and
a P frame is referred to as a predictive frame interval.
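The two parameters defined above can be illustrated with a minimal Python sketch. It assumes frames are listed in display order, that the first frame of the GOP is the I frame, and that every frame between two anchor frames is a B frame; Fig. 2 is not reproduced here, so this layout and the function name `gop_frame_types` are illustrative assumptions only.

```python
def gop_frame_types(gop_size, predictive_frame_interval):
    """Return the frame types of one GOP in display order.

    Assumption: frame 0 is the I frame, every frame whose index is a
    multiple of the predictive frame interval is a P frame, and all
    remaining frames are B frames.
    """
    types = []
    for n in range(gop_size):
        if n == 0:
            types.append("I")
        elif n % predictive_frame_interval == 0:
            types.append("P")
        else:
            types.append("B")
    return "".join(types)


# GOP size 12 with a predictive frame interval of 3.
layout = gop_frame_types(12, 3)
```

With a GOP size of 12 and a predictive frame interval of 3, the sketch yields the sequence IBBPBBPBBPBB.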

An I frame inserting interval has been conventionally
constant irrespective of the feature of the input video
picture: namely, the GOP size has been fixed, so that
intra-frame coding has been forcibly carried out per
predetermined number of frames. Consequently, the I frame
has been inserted even in the case where the input video
picture has a high correlation with the reference video
picture and coding efficiency can be enhanced by using
inter-frame prediction coding.

As for the predictive frame interval, the predictive frame
interval of highest coding efficiency depends on the feature
of the video picture. For example, a video picture of a
swift motion can be predicted from the reference video
picture with high efficiency by shortening the predictive
frame interval, thus enhancing the coding efficiency. On
the contrary, in the case of little variation, the coding
efficiency is enhanced by prolonging the predictive frame
interval. However, since the predictive frame interval
is fixed to about 0.1 second irrespective of the feature
of the video picture in the conventional system, the coding
efficiency cannot be enhanced.

Furthermore, in a video picture compression system
capable of coding by either a frame structure or a field
structure, there can be used either coding by "the field
structure", in which one video picture to be coded is coded
in a manner corresponding to one field video picture, or coding
by "the frame structure", in which one video picture to be
coded is coded in a manner corresponding to one interlaced
frame video picture. However, in the prior art, which of
the frame structure and the field structure is selected is
designated from the outside before the video picture is
coded, so that the video picture to be input is coded by
fixedly using the designated structure, thereby outputting
coded data. That is, the coding is carried out by the fixed
picture structure irrespective of the feature of the video
picture.

Therefore, even in the case of coding a video picture
of a swift motion in which the coding efficiency can be
enhanced by adopting the field structure, the coding by the
frame structure is continued if the frame structure is
previously designated as the coding picture structure,
resulting in degradation of the coding efficiency. On the
contrary, in the case where the coding by the field structure
is previously designated, the coding efficiency cannot be
enhanced since the field structure is fixedly used even if
the coding efficiency can be enhanced by the frame structure.



Additionally, in the case where it is not known whether
the input video picture is an interlaced video picture or
a non-interlaced video picture, high coding efficiency
can be achieved by a 2-step system in which it is previously
discriminated by some method whether or not the input video
picture is an interlaced video picture, and thereafter the
picture structure is switched from the outside at the time
of coding based on the discrimination information. Such
a 2-step system is unavailable on the assumption of coding
in real time.



SUMMARY OF THE INVENTION

The present invention has been accomplished in an
attempt to solve the above problems experienced by the prior
art. Therefore, an object of the present invention is to
provide a video coding apparatus in which coding efficiency
can be enhanced and a quality of a coded video picture can
be stabilized by adaptively changing a GOP size and a
predictive frame interval according to the feature of an
input video picture or variations of the feature of the input
video picture.

Another object of the present invention is to provide
a video coding apparatus in which coding efficiency can be
enhanced and a quality of a coded video picture can be
stabilized by automatically discriminating whether an input
video picture having no information on the feature or
structure of a video picture is an interlaced input video
picture or a non-interlaced input video picture and analyzing
the feature of the video picture to be input, so as to
adaptively change a picture structure in video picture
compressing/coding to a frame structure or a field structure.

In order to achieve the above objects, the present
invention has a first characteristic in means for detecting
a variance between the video pictures based on information
on sequentially input video pictures, determining the
correlation between the video pictures based on the detected
information, and deciding the video picture for which an
intra-frame coding system is used according to the degree
of the correlation.

With this characteristic, a GOP size depends on the
feature of the video picture.

Furthermore, the present invention has a second
characteristic in means for detecting a motion feature
between the input video pictures so as to decide an optimum
predictive frame interval.

With this characteristic, the optimum predictive frame
interval can be decided based on the motion feature between
the input video pictures.

Moreover, the present invention has a third
characteristic in means for discriminating whether each of
sequentially input video pictures is an interlaced video
picture or a non-interlaced video picture, wherein coding
by the field structure is selected if the video picture
is an interlaced video picture while coding by the frame
structure is selected unless the video picture is an
interlaced video picture.

Additionally, the present invention has a fourth
characteristic in calculating a variance of a video picture
based on an interlaced video picture to be input so as to
switch coding by the frame/field structures based on the
calculation value.

With these third and fourth characteristics, it is
possible to prevent any degradation of the coding efficiency
caused by a variation in feature of the input video picture,
which was inevitable at the time of fixed selection of the
frame/field structures in the prior art. Furthermore,
since whether the input video picture is an interlaced video
picture or a non-interlaced video picture, which previously
needed to be found before the coding, is automatically
detected at the time of the coding, efficient coding can be
carried out irrespective of the feature or structure of the
input video picture.


BRIEF DESCRIPTION OF THE DRAWINGS 
Fig. 1 is a block diagram illustrating a conventional 
motion compensatory prediction coding apparatus, to which 
the present invention is applied. 
Fig. 2 is a view illustrating a conventional GOP
structure.

Fig. 3 is a block diagram illustrating a motion
compensatory prediction coding apparatus encompassing the
present invention.

Fig. 4 is a block diagram illustrating the configuration
of a GOP structure deciding section in a first preferred
embodiment according to the present invention.

Figs. 5A and 5B are views explanatory of a method for
calculating a variance between two pixels.

Fig. 6 is a view explanatory of creation of a downscaled
video picture for the purpose of simple motion estimation.

Fig. 7 is a block diagram illustrating a second
preferred embodiment according to the present invention.

Fig. 8 is a block diagram illustrating a third preferred
embodiment according to the present invention.

Fig. 9 is a block diagram illustrating a fourth
preferred embodiment according to the present invention.

Fig. 10 is a graph illustrating simulation results
according to the present invention.

Fig. 11 is a block diagram illustrating the
configuration in a fifth preferred embodiment according to
the present invention.

Fig. 12 is a view illustrating the configuration of
a frame video picture.

Fig. 13 is a view explanatory of a pixel for calculating
an absolute difference.

Fig. 14 is a view explanatory of creation of a downscaled
feature plane.

Fig. 15 is a view explanatory of processing of the simple
motion estimation.

Fig. 16 is a block diagram illustrating the
configuration in a sixth preferred embodiment according to
the present invention.

Fig. 17 is a block diagram illustrating the
configuration in a seventh preferred embodiment according
to the present invention.

Fig. 18 is a block diagram illustrating the
configuration in an eighth preferred embodiment according
to the present invention.

Fig. 19 is a block diagram illustrating the configuration
in a ninth preferred embodiment according to the present
invention.



DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention will be described in detail below
in reference to the drawings. Fig. 3 is a block diagram
illustrating the configuration in a first preferred
embodiment according to the present invention. Although
the coding apparatus illustrated in Fig. 1 is used as a video
picture coding system in the description below, the present
invention is not limited to such a coding apparatus. The
same reference numerals as those used in Fig. 1 denote like
or corresponding constituent elements.

This preferred embodiment is characterized in that the
features of sequentially input video signals are analyzed
based on the video signals, so that a GOP structure is decided
according to the features, thereby performing coding
processing based on the GOP structure.

In Fig. 3, first, the features of sequentially input
video signals are analyzed in a GOP structure decision
section 20, which then decides a GOP structure according
to the input video picture based on the features.
Subsequently, when the video picture is coded, a GOP
structure information signal 21 is output to a coding mode
control section 12; in the meantime, coding complexity
prediction information 22 is output to a coding bit rate
control section 17. Operation other than the
above-described operation is similar to that of the coding
apparatus illustrated in Fig. 1, and so its description
will be omitted.

Figs. 4, 7, 8 and 9 are block diagrams illustrating
preferred processings of the GOP structure decision section
20 illustrated in Fig. 3. First of all, the processing will
be explained in reference to Fig. 4 illustrating the first
preferred embodiment according to the present invention.

First, a frame memory 31 stores therein the sequentially
input video signals. The frame memory 31 can store therein
at least as many video pictures as the maximum GOP size.

An inter-frame variance analysis section 32 calculates
a variance of a target video picture based on the target video
picture stored in the frame memory 31 and the temporally
immediately preceding video picture adjacent to the target
video picture, and then outputs inter-frame variance
information A resulting from the calculation to a GOP
boundary position decision section 33. Here, although the
target video picture and the immediately preceding video
picture are used for the calculation of the inter-frame
variance information A, video pictures other than the
immediately preceding video picture may be used.

The GOP boundary position decision section 33 decides
a position optimum for a GOP boundary inside the frame memory
31 based on the inter-frame variance information A output
from the inter-frame variance analysis section 32, and then
outputs the decided position as GOP boundary position
information B. Upon this decision of the GOP boundary
position, the video pictures prior to the decided GOP
boundary position stored inside the frame memory 31
constitute one GOP.

A simple motion estimation section 34 operates after an I
frame inserting position is decided, i.e., after the decision
of the size of one GOP by the GOP boundary position decision
section 33. It decides a reference video picture out of the
video pictures equivalent to one GOP size stored in the frame
memory 31, and then outputs motion feature prediction
information C by simple motion estimation between the
reference video picture and the other video pictures.

Subsequently, a predictive frame interval decision
section 35 decides a predictive frame interval based on the
motion feature prediction information C input from the simple
motion estimation section 34, and then outputs predictive
frame interval information D.

The inter-frame variance information A, the GOP
boundary position information B, the motion feature
prediction information C and the predictive frame interval
information D are input into a coding complexity prediction
section 37, which predicts coding complexity at each of I,
P and B frame coding modes so as to output the resultant
information as coding complexity prediction information E
to a coding bit rate control section 17.

The coding bit rate control section 17 controls a coding
bit rate in coding the input video picture in consideration
of the coding complexity prediction information E input from
the coding complexity prediction section 37. The GOP
boundary position information B and the predictive frame
interval information D are output also to a coding mode
control section 12, which controls switches at the time of
the coding in the GOP structure decided on the basis of the
information B and D.

After the decision of the structure of one GOP inside
the frame memory 31, the frame memory 31 outputs a video
signal to the predictive signal subtraction section 2 shown
in Fig. 3 in order to code each video picture of the GOP.
Information on the output video signal is erased from the
frame memory 31.

Upon completion of the coding of one GOP, the frame
memory 31 stores therein video pictures input in sequence
after the residual video pictures stored therein.
When the frame memory 31 stores therein video signals
equivalent to the maximum GOP size, it performs the
processing of deciding a next GOP structure. This
processing is repeated.

Next, description will be given in detail of the
operation of each of the constituent elements illustrated
in Fig. 4.

First, the frame memory 31 stores therein the
sequentially input video signals. The number of video
pictures to be stored is equal to or greater than the maximum
GOP size, which is decided at the time of the coding. The
frame memory 31 outputs the video signals to the inter-frame
variance analysis section 32 and the simple motion estimation
section 34, respectively. When the structure of one GOP
is decided in the stored video pictures, the video signal
is output to the video coding apparatus. Consequently, the
output video signal is erased from the frame memory 31, and
then a newly input video signal is stored in the vacated
region in the frame memory 31.

Subsequently, the inter-frame variance analysis
section 32 fetches two pieces of video picture information
from the frame memory 31 to calculate the inter-frame
variance information A. The calculating methods include
a method for calculating a variance based on the intra-frame
sum of absolute differences of pixel information on the two
video pictures at the same position; and a method for dividing
the video picture into small blocks, determining dispersion
values of pixels in the small blocks, and calculating the
intra-frame sum of absolute differences between frames in
which the dispersion values are representative of the small
blocks.

In the former deciding method, as shown in, for example,
Fig. 5A, assuming that pixel values of the video pictures
(i) and (j) are designated by $P_{i1}, P_{i2}, \ldots, P_{in}$ and
$P_{j1}, P_{j2}, \ldots, P_{jn}$, respectively, the intra-frame sum A
of the absolute differences is expressed by the following
equation (1):

$$A = \sum_{k=1}^{n} \left| P_{ik} - P_{jk} \right| \tag{1}$$

Furthermore, in the latter deciding method, as shown
in, for example, Fig. 5B, assuming that dispersion values
of the small blocks in the video pictures (i) and (j) are
designated by $\sigma_{i1}, \sigma_{i2}, \ldots, \sigma_{im}$ and
$\sigma_{j1}, \sigma_{j2}, \ldots, \sigma_{jm}$, respectively, the
intra-frame sum A of the absolute differences is expressed
by the following equation (2):

$$A = \sum_{k=1}^{m} \left| \sigma_{ik} - \sigma_{jk} \right| \tag{2}$$

Although each of the pixel values in the decision
methods is processed by using only luminance, it may be
processed by using chrominance or using both luminance and
chrominance. The inter-frame variance information A
calculated by the inter-frame variance analysis section 32
is output to the GOP boundary position decision section 33
and the coding complexity prediction section 37.
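The two calculating methods for the inter-frame variance information A can be sketched in Python as follows. This is a minimal illustration, assuming that each video picture is a list of rows of luminance values, that the "dispersion value" of a small block is its population variance, and that the block is square; the function names are hypothetical and are not taken from the specification.

```python
def pixel_sad(frame_i, frame_j):
    """Equation (1): sum of |P_ik - P_jk| over co-located pixels."""
    return sum(abs(p - q)
               for row_i, row_j in zip(frame_i, frame_j)
               for p, q in zip(row_i, row_j))


def block_variances(frame, block=8):
    """Population variance of each block x block tile, in raster order."""
    height, width = len(frame), len(frame[0])
    variances = []
    for top in range(0, height - block + 1, block):
        for left in range(0, width - block + 1, block):
            pixels = [frame[y][x]
                      for y in range(top, top + block)
                      for x in range(left, left + block)]
            mean = sum(pixels) / len(pixels)
            variances.append(sum((p - mean) ** 2 for p in pixels) / len(pixels))
    return variances


def variance_sad(frame_i, frame_j, block=8):
    """Equation (2): sum of |sigma_ik - sigma_jk| over small blocks."""
    return sum(abs(a - b)
               for a, b in zip(block_variances(frame_i, block),
                               block_variances(frame_j, block)))
```

The second method compares one value per block instead of one value per pixel, so for 8x8 blocks it sums 64 times fewer terms once the block variances are available.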

The GOP boundary position decision section 33 receives the
inter-frame variance information A from the inter-frame
variance analysis section 32 and, in the case where the value
of the information A for a frame exceeds a predetermined
threshold value, decides the video picture immediately before
that frame as the GOP boundary. Alternatively, the GOP boundary
position decision section 33 may decide a video picture
immediately before a video picture having a maximum value
of the information A as the GOP boundary based on the
inter-frame variance information A on all of the video
pictures stored inside the frame memory 31; or it may decide
the boundary based on a logical sum or a logical product
obtained by both the system using the threshold value and the
system using the maximum value. The GOP boundary position
information B obtained in the GOP boundary position decision
section 33 is output to the simple motion estimation section
34, the coding mode control section 12 and the coding
complexity prediction section 37.
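The boundary decision rules described above might be sketched in Python as follows. The specification does not state how the logical sum and logical product are formed, so the per-picture interpretation used here (a picture qualifies if it exceeds the threshold, is the maximum, either, or both) is an assumption, as are the function name `gop_boundary` and the `mode` strings.

```python
def gop_boundary(a_values, threshold, mode="threshold"):
    """Return the index of the last picture of the GOP, or None.

    a_values[k - 1] is the inter-frame variance information A between
    picture k and picture k - 1, for k = 1 .. len(a_values).
    The boundary is placed immediately before the first picture that
    satisfies the selected rule (an assumed interpretation).
    """
    if not a_values:
        return None
    peak = max(a_values)
    for k, a in enumerate(a_values, start=1):
        exceeds = a > threshold
        is_peak = a == peak
        hit = {
            "threshold": exceeds,          # threshold system
            "max": is_peak,                # maximum-value system
            "or": exceeds or is_peak,      # logical sum of both systems
            "and": exceeds and is_peak,    # logical product of both systems
        }[mode]
        if hit:
            return k - 1
    return None


a_values = [3, 4, 50, 5, 90, 2]
boundary = gop_boundary(a_values, threshold=40)
```

In this example the variance between pictures 2 and 3 is the first to exceed the threshold, so picture 2 closes the GOP under the threshold rule, whereas the maximum-value rule closes it at picture 4.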

After one GOP size with respect to the video pictures
inside the frame memory 31 is decided in the GOP boundary
position decision section 33, the simple motion estimation
section 34 performs simple motion estimation processing in
order to predict motion information on the video pictures
inside the GOP. In a method for collecting the most accurate
motion information, the input video picture is divided into
small blocks each composed of 8x8 pixels or 16x16 pixels,
each of the small blocks is subjected to motion estimation,
and the most accurate motion information is determined based
on the resultant motion information on each of the small
blocks, in the same manner as the motion estimation
processing by the motion compensator 10 in the video coding
apparatus illustrated in Fig. 3. However, the processing
quantity required for such motion estimation is huge:
addition processing of 2^31 times or more is required for
the motion estimation processing of one video picture in the
case where, for example, the video picture size is 720 x 480
pixels and the motion estimation falls within the range of
±16 pixels. As a consequence, the present invention uses the
means for predicting the motion information on the video
picture based on the information resulting from the simple
motion estimation processing of a small processing quantity
performed in the simple motion estimation section 34.
Description will be given below of the simple motion
estimation processing.

First, one video picture inside the decided GOP is
selected as a reference video picture. Thereafter, the
reference video picture is divided into small blocks.
Subsequently, a downscaled video picture, in which each small
block is expressed by one representative value, is created.
Here, the dispersion of all of the pixel values inside the
small block, for example, can be used for calculation of
the representative value. The oldest video picture in
the target GOP is selected as the reference video picture,
but other video pictures may be selected.

Next, in order to grasp the motion features in
comparison with the reference video picture, the target video
pictures are determined, and then the downscaled video
pictures of these video pictures are created. Thereafter,
the motion estimation processing is performed by the use
of the downscaled video pictures of both the reference
video picture and the target video picture. Although,
according to the present invention, the motion estimation
processing is performed with respect to all of the video
pictures except the reference video picture inside the GOP,
only some selected video pictures, rather than all of them,
may be subjected to the motion estimation processing.

Fig. 6 illustrates a method for creating the downscaled
video picture. Assuming that a video picture to be input
is composed of M pixels in a horizontal direction multiplied
by N pixels in a vertical direction and a small block is
composed of 8 pixels in each of the horizontal and vertical
directions, one representative value is obtained per small
block of 64 pixels, so that the downscaled video picture to
be created is composed of M/8 values in the horizontal
direction multiplied by N/8 values in the vertical direction
in the case where N and M each are a multiple of 8.
Furthermore, the size of the small block need not be 8x8
pixels; it may be 16x16 pixels or any other rectangular
block size.
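The creation of the downscaled video picture can be sketched in Python as follows. This is a minimal illustration, assuming the picture is a list of N rows of M luminance values, that the representative value is the population variance of the block, and that any partial blocks at the right and bottom edges are ignored; the function name `downscale` is hypothetical.

```python
def downscale(frame, block=8):
    """Replace each block x block tile by one representative value.

    The representative value used here is the population variance of
    the pixel values in the tile (an assumed reading of "dispersion").
    An N x M picture yields an (N // block) x (M // block) plane.
    """
    height, width = len(frame), len(frame[0])
    small = []
    for top in range(0, height - block + 1, block):
        row = []
        for left in range(0, width - block + 1, block):
            pixels = [frame[y][x]
                      for y in range(top, top + block)
                      for x in range(left, left + block)]
            mean = sum(pixels) / len(pixels)
            row.append(sum((p - mean) ** 2 for p in pixels) / len(pixels))
        small.append(row)
    return small


# A synthetic picture with M = 24 and N = 16 gives a 3 x 2 downscaled plane.
picture = [[(x * 7 + y * 3) % 256 for x in range(24)] for y in range(16)]
plane = downscale(picture)
```

Because each 8x8 block collapses to a single value, any search performed on the downscaled plane touches 64 times fewer samples than the same search on the full picture.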

Although the dispersion value of the pixel values inside
the small block is used for the calculation of the
representative value per small block in this system,
an average value, a standard deviation, an absolute error
sum with respect to the average value or combinations thereof
may be used. Although a luminance is used herein as the
pixel value, a luminance and/or a chrominance may be used.

The motion estimation is generally performed per small
block, so that a vector indicating a position of a smallest
difference is calculated, thereby obtaining the motion
feature. However, accuracy of the motion vector
information is low since the downscaled video picture is
used in this system. Consequently, the smallest motion
compensatory prediction error at the time of the motion
estimation based on the downscaled video picture information
is used as a motion feature value of the video picture, i.e.,
the motion feature prediction information C as an index of
the magnitude of the motion of the entire video picture. A
square error, an absolute error, or an absolute error at a
square root may be used for the calculation of the motion
compensatory prediction error.
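The use of the smallest prediction error as the motion feature prediction information C might be sketched in Python as follows. As a simplification, this sketch treats the whole downscaled picture as a single block, searches integer displacements within a small range, and uses the mean absolute error over the overlapping region; the per-block search, the search range and the function name `motion_feature` are assumptions, not details given in the specification.

```python
def motion_feature(reference, target, search=2):
    """Smallest mean absolute prediction error over all displacements.

    reference and target are downscaled planes (lists of rows) of the
    same size. For each displacement (dy, dx) within +/- search, the
    target is predicted from the displaced reference and the error is
    averaged over the samples where both planes overlap.
    """
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            total, count = 0.0, 0
            for y in range(height):
                for x in range(width):
                    ry, rx = y + dy, x + dx
                    if 0 <= ry < height and 0 <= rx < width:
                        total += abs(target[y][x] - reference[ry][rx])
                        count += 1
            if count:
                error = total / count
                if best is None or error < best:
                    best = error
    return best
```

Only the minimum error is kept; the displacement that produced it is discarded, which matches the remark above that the vector itself is of low accuracy on a downscaled picture.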

The obtained motion compensatory prediction error as
the motion feature prediction information C is input into
the predictive frame interval decision section 35, which
decides a predictive frame interval based on the motion
feature prediction information C. The most efficient coding
is achieved by making the predictive frame interval small in
the case where a motion or a variation between the video
pictures is great, and, on the contrary, making the
predictive frame interval large in the case where a motion
or a variation between the video pictures is small.
Consequently, in order to grasp the motion feature over one
GOP, the motion feature prediction information C between the
reference video picture and each of the other video pictures
inside one GOP is obtained, and then the average value
thereof is obtained.
The average value is used as a representative value, on the
basis of which the predictive frame interval is determined.
One of the characteristics of the present invention resides
in that the inversely proportional relationship is
established between the predictive frame interval and the
obtained average value. Besides the method for using the
average value, a maximum value or a minimum value may be
used for the calculation of the representative value inside
one GOP.
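The inversely proportional relationship between the predictive frame interval and the average of the motion feature prediction information C can be sketched in Python as follows. The proportionality constant `gain`, the upper limit `max_interval` and the rounding are hypothetical choices made only for illustration; the specification does not give concrete values.

```python
def predictive_frame_interval(c_values, gain=12.0, max_interval=6):
    """Decide a predictive frame interval from the values of C in one GOP.

    The interval is taken as gain / mean(C), rounded and clamped to
    the range 1 .. max_interval. gain and max_interval are assumed
    tuning parameters.
    """
    mean_c = sum(c_values) / len(c_values)
    if mean_c <= 0:
        # No detected motion: use the longest interval allowed.
        return max_interval
    return max(1, min(max_interval, round(gain / mean_c)))


slow_scene = predictive_frame_interval([2.0, 4.0])
fast_scene = predictive_frame_interval([12.0, 36.0])
```

With these assumed parameters, a GOP whose average C is 3 receives an interval of 4, whereas a GOP whose average C is 24 receives the minimum interval of 1.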

Since a relative motion quantity with respect to a pixel
becomes large in the case where the resolution of the video
picture to be input is high, the inversely proportional
relationship is established between the resolution of the
video picture and an optimum predictive frame interval.
Another characteristic of the present invention resides in
that the inversely proportional relationship with respect
to the resolution information on the video picture is
considered in deciding the predictive frame interval. The
decided predictive frame interval information D is output
to the coding complexity prediction section 37 and the coding
mode control section 12. The GOP boundary position
information B and the predictive frame interval information
D are transmitted together to the coding mode control section
12, in which the switches are controlled based on the
information B and D in coding the video pictures.

The inter-frame variance information A, the GOP
boundary position information B, the motion feature
prediction information C and the predictive frame interval
information D are input into the coding complexity prediction
section 37, which calculates coding complexity prediction
information E as an index of generated code quantity
prediction in coding at a coding mode of each of the I, P
and B frames, and then outputs the coding complexity
prediction information E to the coding bit rate control
section 17.

When the processing proceeds to coding of a new GOP,
the coding bit rate control section 17 renews coding
complexity prediction information at each of the coding modes
based on the coding complexity prediction information E input
from the coding complexity prediction section 37. Coding
complexity prediction information used at past frames having
the same coding mode has been conventionally used
irrespective of switching of the input video picture or
fluctuations. Consequently, in the case where an input
video picture has undergone a large change such as a
change in scene, the video picture has been influenced by
coding complexity prediction information on a frame having
no correlation, with an attendant problem of marked
degradation of the quality of the video picture. However,
since the prediction is carried out based on the information
of the video picture to be coded according to the present
invention, the above-described problem can be solved.

Subsequently, explanation will be made on a method for
calculating the coding complexity prediction information
E at each of the coding modes. A video picture which is
coded as an I frame is divided into small blocks and the
dispersion of the pixel values per small block is determined,
so that the coding complexity prediction information E at the
I frame is calculated as the product of the intra-frame average
of the dispersions and a fixed value SI serving as a scaling
parameter. Luminance information and/or chrominance
information may be used as the pixel value.

In the case where the absolute difference calculated
between the dispersion of the pixel values of a small block
and the dispersion of the pixel values of an adjacent small
block exceeds a threshold value, it is judged that the small
block region of the input video picture includes edge
information such as an outline, so that the coding bit rate
control section 17 takes the judgement into consideration so
as to assign a larger code quantity in coding the small block
region.
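
The edge judgement described above can be sketched as follows. This is only an illustrative sketch: the function names `block_dispersions` and `edge_blocks` are hypothetical, the dispersion is taken to be the population variance, and "adjacent" is assumed to mean the four blocks above, below, left and right, none of which is specified in the description.

```python
from statistics import pvariance

def block_dispersions(frame, bs):
    """Return a 2-D list with the pixel-value variance of each bs x bs block."""
    rows, cols = len(frame) // bs, len(frame[0]) // bs
    return [[pvariance([frame[by * bs + y][bx * bs + x]
                        for y in range(bs) for x in range(bs)])
             for bx in range(cols)]
            for by in range(rows)]

def edge_blocks(disp, threshold):
    """Flag blocks whose dispersion differs from a neighbour's by more than threshold."""
    rows, cols = len(disp), len(disp[0])
    flags = [[False] * cols for _ in range(rows)]
    for by in range(rows):
        for bx in range(cols):
            # Assumption: compare with the four directly adjacent blocks.
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = by + dy, bx + dx
                if 0 <= ny < rows and 0 <= nx < cols:
                    if abs(disp[by][bx] - disp[ny][nx]) > threshold:
                        flags[by][bx] = True
    return flags
```

A rate controller could then give the flagged blocks a larger share of the code quantity; how much larger is not stated in the description.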

An average of the motion compensatory prediction errors
is obtained based on all of the motion feature prediction
information C inside the target GOP, and then the coding
complexity prediction information E at the P frame is
calculated as the product of the average value and
a fixed value SP serving as a scaling parameter. Alternatively,
the coding complexity prediction information E at the I frame
may be scaled for the calculation.

The coding complexity prediction information E at the
B frame is calculated as the product of the coding complexity
prediction information at the P frame and a fixed
value SB serving as a scaling parameter.
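
The three calculations above can be summarised in a short sketch. The function names and the parameter names `s_i`, `s_p` and `s_b` (standing for the fixed values SI, SP and SB) are hypothetical, the dispersion is assumed to be the population variance, and the prediction errors are assumed to be available as a plain list of numbers derived from the information C.

```python
from statistics import mean, pvariance

def complexity_i(frame, bs, s_i):
    """E at the I frame: SI times the intra-frame average of per-block variances."""
    rows, cols = len(frame) // bs, len(frame[0]) // bs
    disp = [pvariance([frame[by * bs + y][bx * bs + x]
                       for y in range(bs) for x in range(bs)])
            for by in range(rows) for bx in range(cols)]
    return s_i * mean(disp)

def complexity_p(prediction_errors, s_p):
    """E at the P frame: SP times the average prediction error inside the GOP."""
    return s_p * mean(prediction_errors)

def complexity_b(e_p, s_b):
    """E at the B frame: SB times the P-frame complexity."""
    return s_b * e_p
```

The values of the scaling parameters are not given in the description and would have to be tuned for a particular encoder.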

Subsequently, a second embodiment according to the
present invention is illustrated in Fig. 7. The present
embodiment is configured such that the processing of the
coding complexity prediction section 37 in the first
embodiment illustrated in Fig. 4 is omitted.

Next, a third embodiment according to the present
invention is illustrated in Fig. 8. The present embodiment
is configured such that the processing concerned in the
decision of the GOP size in the second embodiment illustrated
in Fig. 7 is omitted. In the present embodiment, the GOP size
is fixed to a length designated in advance. In each GOP,
an optimum predictive frame interval is adaptively varied
based on the motion feature prediction information C.

Subsequently, a fourth embodiment according to the
present invention is illustrated in Fig. 9. The present
embodiment is configured such that the processing concerned
in the decision of the predictive frame interval in the second
embodiment illustrated in Fig. 7 is omitted. In the present
embodiment, the predictive frame interval is fixedly
designated in advance. Only the GOP size is adaptively varied
based on the inter-frame variance information A, which is a
feature of an input video picture.

As is obvious from the above description, since
according to the present invention the GOP size is decided
according to the feature or variation of the input video
picture, the GOP size can be decided in a manner adaptive
to the variation of the input video picture. Therefore,
it is possible to avoid degradation of the coding efficiency
or fluctuation of the quality of the video picture which may
occur in the case of coding with a fixed GOP size.

Moreover, since the motion feature of the video picture
inside the GOP can be detected based on the decided GOP size
and the predictive frame interval can be set according to the
motion feature, the predictive frame interval follows the
motion feature of the input video picture. Consequently, it
is possible to enhance the coding efficiency beyond that of
the conventional coding at a fixed predictive frame interval.

Additionally, in the prior art, the coding complexity
prediction information used in the preceding GOP has been
considered even in the case where there has been no correlation
in video picture feature between a preceding GOP and a target
GOP due to a scene change or the like, thereby inducing marked
fluctuations or deterioration of the quality of the video
picture to be coded or degradation of the coding
efficiency. In contrast, according to the present
invention, the coding complexity prediction information is
calculated based on the features of the video pictures inside
the target GOP after the completion of the coding of one
GOP and before coding of a next GOP, so that a video picture
can be coded with a stable quality without any influence
of the feature of the video picture inside an irrelevant
GOP.

Fig. 10 shows the simulation result on a video picture
with a change in scene in the MPEG2 system. In this
simulation, under the condition where compression coding
was carried out at a coding rate of 4 Mbit/s, fluctuation
of PSNR was small and the quality of the video picture could
be improved by 0.65 dB according to the present invention
in comparison with the coding in the prior art in which the
GOP size was fixed to 15 frames and the predictive frame
interval was fixed to 3 frames.

Next, a fifth embodiment according to the present
invention will be explained in reference to Fig. 11. In
the present embodiment, it is discriminated based on each
of sequentially input video signals (stationary video
signals) whether or not an input video picture is an
interlaced video picture. If the input video picture is
an interlaced video picture, a downscaled feature plane is
created, and then coding in a frame/field structure is
decided based on the result of simple motion estimation
processing by the use of the downscaled feature plane.

In Fig. 11, an interlaced/non-interlaced video
discriminant section 51 discriminates whether each of
sequentially input video signals 1 is an interlaced video
signal or a non-interlaced video signal. The
discrimination result is output as
interlaced/non-interlaced discriminant information 52 to
a downscaled feature plane creation section 53. The
downscaled feature plane creation section 53 creates
downscaled feature plane information 54 in consideration
of the feature of the video picture with respect to the video
picture which is discriminated as the interlaced video
picture in the interlaced/non-interlaced video
discriminant section 51, and outputs the downscaled feature
plane information 54 to a simple motion estimation section
55. The simple motion estimation section 55 performs simple
motion estimation processing between two downscaled feature
planes, and outputs the resultant motion compensatory
prediction error as image variance information 56 to a
frame/field structure decision section 57.

Based on the image variance information 56 obtained
by the simple motion estimation section 55, the frame/field
structure decision section 57 decides coding by the frame
structure in the case of a small variance and coding by
the field structure in the case of a large variance, and
outputs the result as picture structure information 58 to
a video coding section 59. The video coding section 59
performs video coding with respect to the input video signal
1 in response to the picture structure information 58 indicated
by the frame/field structure decision section 57, and outputs
coded data 16. Here, the video coding section 59 switches
the operations of, for example, the motion compensator 10,
the first variable length encoder 5 and the second variable
length encoder 14, illustrated in Fig. 1, to the operation
adaptive to coding in the frame/field structure according
to the designation of the frame/field structure based on
the picture structure information 58.

Next, description will be given of one example of the
configuration and operation of each of the constituent
elements in Fig. 11. First, explanation will be made on
the interlaced/non-interlaced video discriminant section
51. The discrimination as to whether or not the video picture
is an interlaced video picture is decided by a calculation
with some adjacent pixels based on the video picture
information to be input. Fig. 12 illustrates the
configuration of frame video information to be input. The
video information is composed of an array of spatially
uniformly arranged pixels. Based on the video information,
five pixel values continuous in a vertical direction at an
arbitrary position are taken, and then an absolute
difference between two pixels is calculated, as illustrated
in Fig. 13.

In five pixels p(-2) to p(2), wherein the pixel
positioned at the center in the vertical direction is
designated by p(0), absolute differences are calculated
between pixels belonging to the same field, namely 0 and -2,
0 and 2, and -1 and 1, and between pixels belonging to
different fields, namely 0 and -1, and 0 and 1. It
is then verified whether or not the condition expressed by
inequality (3) below is satisfied:

$$\mathrm{Max}\bigl(d(0,-2),\ d(0,2),\ d(-1,1)\bigr) < \text{threshold value} \qquad (3)$$

Subsequently, if the condition expressed by inequality
(3) above is satisfied, it is further verified whether or
not the condition expressed by inequality (4) below is
satisfied:

$$\mathrm{Max}\bigl(d(0,-2),\ d(0,2),\ d(-1,1)\bigr) + \text{offset} < \mathrm{Min}\bigl(d(0,-1),\ d(0,1)\bigr) \qquad (4)$$

Here, d(a,b) represents the absolute difference between
the pixel values p(a) and p(b); Max(a,b,c), the maximum value
of a, b and c; and Min(a,b), the minimum value of a and b.
That is, in the case where the pixel values belonging to the
same field are similar to each other and the maximum absolute
difference is less than the threshold value (a fixed value), it
is verified whether or not the minimum value of the absolute
differences between the different fields exceeds a value obtained
by adding an offset (a fixed value) to the maximum absolute
difference within the same field. This processing is performed
with respect to all of the pixels or an arbitrary number
of positions inside the video picture. In the case where
the number of points satisfying both inequalities (3) and (4)
exceeds a predetermined proportion of the points satisfying
inequality (3), the video picture is discriminated as an
interlaced video picture, and then the result is output as the
interlaced/non-interlaced discriminant information 52 per
frame to the downscaled feature plane creation section 53.
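
The discrimination by inequalities (3) and (4) can be sketched as below. The function name `is_interlaced` and the parameter names are hypothetical, the frame is assumed to be a list of rows of pixel values, and the check is applied at every position that has two rows above and below it, which is only one of the sampling options the description allows.

```python
def is_interlaced(frame, threshold, offset, rate):
    """Apply inequalities (3) and (4) at every position with five vertical pixels."""
    height, width = len(frame), len(frame[0])
    count3 = 0    # positions satisfying inequality (3)
    count34 = 0   # positions satisfying both (3) and (4)
    for y in range(2, height - 2):
        for x in range(width):
            def d(a, b):
                # Absolute difference between p(a) and p(b), with p(0) at row y.
                return abs(frame[y + a][x] - frame[y + b][x])
            same_field = max(d(0, -2), d(0, 2), d(-1, 1))
            other_field = min(d(0, -1), d(0, 1))
            if same_field < threshold:
                count3 += 1
                if same_field + offset < other_field:
                    count34 += 1
    # Interlaced when the (3)-and-(4) points exceed the given rate of the (3) points.
    return count3 > 0 and count34 > rate * count3
```

For instance, a frame whose even and odd rows hold clearly different values satisfies both inequalities everywhere and is judged interlaced, whereas a uniform frame satisfies only inequality (3) and is not.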
Furthermore, although the description has been given
of the example in which the discrimination of the
interlaced/non-interlaced video picture is performed by the
use of the five pixels in the vertical direction, the number
of pixels required for the verification is arbitrary as long
as it is three or more, so that comparison can be conducted
between adjacent pixels in the same field and pixels in
different fields. Moreover, the verification may be conducted
at all of the pixels inside the video picture. Otherwise,
sample points may be investigated such
that the above-described verification is conducted at a
specific position or an arbitrary position inside one block
composed of, for example, five pixels in the vertical
direction and n pixels in the horizontal direction.
Alternatively, entirely arbitrary points may be spot-checked
at random.

Subsequently, explanation will be made, in reference to
Fig. 14, on the processing of the downscaled feature plane
creation section 53 illustrated in Fig. 11, i.e., the
processing of creating a downscaled plane in consideration of
the feature of the video picture based on an original video
picture. First, the original video picture is divided
into small blocks, each of which is expressed by a
representative value. According to the present invention,
the standard deviation of the pixel values per small block
is used as the representative value. Alternatively, an average
value or a median value may be used as the representative
value. A luminance component of the pixel may be used as the
pixel value at the time of the calculation; or other components
or an average thereof may be used. Furthermore, the size
of the small block may be arbitrarily set. Assuming that
the small block is composed of ph pixels in the horizontal
direction multiplied by pv pixels in the vertical direction,
the downscaled feature plane is composed of H/ph pixels in
the horizontal direction multiplied by V/pv pixels in the
vertical direction with respect to the size of the original
video picture (H pixels in the horizontal direction
multiplied by V pixels in the vertical direction), so that
the number of samples becomes 1/(ph x pv) of
the number of pixels of the original video picture. The
downscaled plane having the standard deviation of the small
block as the representative value is the downscaled feature
plane information 54.
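
A minimal sketch of the downscaled feature plane creation follows. The function name `downscaled_feature_plane` is hypothetical, the picture is assumed to be a list of rows of luminance values whose dimensions are multiples of the block size, and the population standard deviation is assumed because the description does not say which form is meant.

```python
from statistics import pstdev

def downscaled_feature_plane(frame, ph, pv):
    """Represent each ph x pv block by the standard deviation of its pixels."""
    v, h = len(frame), len(frame[0])
    return [[pstdev([frame[by * pv + y][bx * ph + x]
                     for y in range(pv) for x in range(ph)])
             for bx in range(h // ph)]
            for by in range(v // pv)]
```

The result has H/ph columns and V/pv rows, so a 4 x 4 picture with 2 x 2 blocks yields a 2 x 2 plane holding one quarter of the original number of samples.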

Next, description will be given below of the processing
by the simple motion estimation section 55 illustrated in
Fig. 11. The simple motion estimation section 55 performs
the motion estimation processing between the two downscaled
feature planes based on the downscaled feature plane
information 54 created by the downscaled feature plane
creation section 53. The temporal distance between a
reference plane and a target plane to be subjected to the
simple motion estimation is an arbitrarily fixed value.
Motion estimation by block matching or the like can be used
as the motion estimating method. In this case, the block
can take an arbitrary natural number for both of the horizontal
and vertical sizes on the downscaled feature plane.
Consequently, the motion estimation per block can be carried
out by using, as one block, one sample at the minimum or
the entirety of one downscaled feature plane at the maximum.

Referring to Fig. 15, explanation will be made below.
An upper left coordinate of the set block is designated by
(k,l); an element on a downscaled feature plane 1, c(k,l);
and an element on a downscaled feature plane 2, r(k,l).
Reference character N represents the size of the block in
the horizontal direction; and M, the size of the block in
the vertical direction. The estimation range falls within
±sh in the horizontal direction and ±sv in the vertical
direction. An average motion compensatory prediction error
E(k,l) of one element in this simple motion estimation is
determined based on the minimum error within the estimation
range according to the following equations (5) and (6):

$$E(k,l) = \mathrm{Min}\bigl(\mathrm{Err}(k,l,h,v)\bigr) \qquad (5)$$

here,

$$\mathrm{Err}(k,l,h,v) = \sum_{m=0}^{M-1}\sum_{n=0}^{N-1} \bigl|\, c(k+m,\ l+n) - r(k+m+h,\ l+n+v) \,\bigr| \qquad (6)$$

$$(-\mathrm{sh} \le h \le \mathrm{sh},\quad -\mathrm{sv} \le v \le \mathrm{sv})$$

As to the prediction error E(k,l), square root
processing may be performed after determination of a square
error, or an absolute difference may be used. The prediction
error E(k,l) obtained by the simple motion estimation
processing is determined with respect to all of the blocks
on the downscaled feature plane 1, thereby obtaining the
sum Esum on the downscaled feature plane 1. The sum Esum
is an index indicating the magnitude of a variation between
two video pictures. The sum Esum is output as the image
variance information 56 to the frame/field structure
decision section 57.
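
The simple motion estimation and the sum Esum can be sketched with plain block matching using absolute differences. The function names `block_error` and `image_variance` are hypothetical. In this sketch k and h are treated as column indices and l and v as row indices, the blocks are assumed not to overlap, and candidate positions that would fall outside the reference plane are simply skipped; none of these details is fixed by the description.

```python
def block_error(cur, ref, k, l, n, m, sh, sv):
    """E(k,l): minimum sum of absolute differences over the estimation range."""
    rows, cols = len(ref), len(ref[0])
    best = None
    for v in range(-sv, sv + 1):
        for h in range(-sh, sh + 1):
            # Skip candidates that would read outside the reference plane.
            if not (0 <= k + h and k + h + n <= cols
                    and 0 <= l + v and l + v + m <= rows):
                continue
            err = sum(abs(cur[l + j][k + i] - ref[l + v + j][k + h + i])
                      for j in range(m) for i in range(n))
            if best is None or err < best:
                best = err
    return best

def image_variance(cur, ref, n, m, sh, sv):
    """Esum: sum of E(k,l) over all non-overlapping N x M blocks of plane 1."""
    rows, cols = len(cur), len(cur[0])
    return sum(block_error(cur, ref, k, l, n, m, sh, sv)
               for l in range(0, rows - m + 1, m)
               for k in range(0, cols - n + 1, n))
```

Because the planes are already downscaled by ph x pv, even an exhaustive search like this touches far fewer samples than block matching on the original picture, which is presumably why the description calls the estimation "simple".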

Next, the frame/field structure decision section 57
illustrated in Fig. 11 judges whether or not the image
variance information 56 per input frame exceeds a threshold
value. If the image variance information 56 per input frame
is the threshold value or more, the frame/field structure
decision section 57 decides the field structure; to the
contrary, if the image variance information 56 per input
frame is less than the threshold value, the frame/field
structure decision section 57 decides the frame structure.
Thereafter, the frame/field structure decision section 57
outputs the decision result as the picture structure
information 58 to the video coding section 59.
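
The decision rule above amounts to a single comparison, sketched here with a hypothetical function name and string labels chosen only for illustration. The boundary case follows the wording "the threshold value or more", so a variance equal to the threshold selects the field structure.

```python
def decide_picture_structure(image_variance, threshold):
    """Return "field" when the variance is at or above the threshold, else "frame"."""
    return "field" if image_variance >= threshold else "frame"
```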

The video coding section 59 illustrated in Fig. 11
performs the compression coding of the video signal to be
input by the use of the picture structure designated by the
picture structure information 58 output from the frame/field
structure decision section 57, and then outputs the coded
data 16. Specifically, the video coding section 59 switches,
for example, the operations of the motion compensator 10,
the first variable length encoder 5 and the second variable
length encoder 14 illustrated in Fig. 1 to the system adaptive
to coding in the frame/field structure according to the
picture structure.

Fig. 16 is a block diagram illustrating the
configuration of a sixth embodiment according to the present
invention. The same reference numerals as those in Fig.
11 designate like or corresponding constituent elements.
In the present embodiment, a video coding apparatus comprises
a downscaled feature plane creation section 53, a simple
motion estimation section 55 and a frame/field structure
decision section 57. The present embodiment is
characterized in that the frame/field structure decision
section 57 selects coding by a field structure if image
variance information 56 obtained by simple motion estimation
processing in the simple motion estimation section 55 exceeds
a certain threshold value; in the meantime, it selects coding
by a frame structure if the image variance information 56
is less than the threshold value. There is a difference
between the fifth embodiment illustrated in Fig. 11 and the
present embodiment in that in the former embodiment the
downscaled feature plane creation section 53 creates the
downscaled feature plane in the case of the interlaced video
picture, while in the latter the downscaled feature plane
creation section 53 creates the downscaled feature plane
also in the case of a non-interlaced video picture.

Fig. 17 is a block diagram illustrating the
configuration of a seventh embodiment according to the
present invention. The same reference numerals as those
in Fig. 11 designate like or corresponding constituent
elements. In the present embodiment, a video coding
apparatus comprises an interlaced/non-interlaced video
discriminant section 51 and a frame/field structure decision
section 57. The interlaced/non-interlaced video
discriminant section 51 discriminates whether or not an input
video picture is an interlaced video picture. The present
embodiment is characterized in that the frame/field
structure decision section 57 selects coding by a field
structure in the case where the input video picture is an
interlaced video picture; to the contrary, it selects coding
by a frame structure in the case where the input video picture
is a non-interlaced video picture.

Fig. 18 is a block diagram illustrating the
configuration of an eighth embodiment according to the
present invention. The same reference numerals as those
in Fig. 11 designate like or corresponding constituent
elements. In the present embodiment, a video coding
apparatus comprises an interlaced/non-interlaced video
discriminant section 51, an interlaced/non-interlaced
video switch section 60, a downscaled feature plane creation
section 53, a simple motion estimation section 55 and a
frame/field structure decision section 57. The
interlaced/non-interlaced video discriminant section 51
discriminates whether one video picture input first or a
plurality of video pictures are interlaced or non-interlaced
video pictures. Based on the discrimination, the
interlaced/non-interlaced video switch section 60 switches
between "0" and "1". As for video pictures input thereafter,
the interlaced/non-interlaced video discriminant section 51
does not perform discrimination of interlaced or
non-interlaced video pictures. The present embodiment is
different in the above-described point from the fifth
embodiment illustrated in Fig. 11.

Fig. 19 is a block diagram illustrating the
configuration of a ninth embodiment according to the present
invention. The same reference numerals as those in Fig.
11 designate like or corresponding constituent elements.
In the present embodiment, a video coding apparatus comprises
an interlaced/non-interlaced video discriminant section 51,
an interlaced/non-interlaced video switch section 60 and
a frame/field structure decision section 57. The
interlaced/non-interlaced video discriminant section 51
discriminates whether one video picture input first or a
plurality of video pictures are interlaced or non-interlaced
video pictures. Based on the discrimination, the
interlaced/non-interlaced video switch section 60 switches
between "0" and "1". As for video pictures input thereafter,
the interlaced/non-interlaced video discriminant section 51
does not perform discrimination of interlaced or
non-interlaced video pictures. The present embodiment is
different in the above-described point from the seventh
embodiment illustrated in Fig. 17.




As is obvious from the above description, although the
video picture having an improvable coding efficiency is
limited in the conventional coding by the fixed picture
structure, since the picture structure used for coding is
selected according to the feature or variation of
the input video picture according to the present invention,
the high coding efficiency can be kept even if a video picture
having any feature is input or the feature of the video picture
is varied on the way.
Furthermore, the video coding simulation is conducted
by using the MPEG2 video coding system as the video coding
system in which the motion compensatory prediction coding
can be carried out by either the frame structure or the field
structure. As a result, the quality of the video picture
can be improved by about 0.4 dB to 1.0 dB of PSNR according
to the present invention in comparison with the case where
the picture structure is fixed to the frame structure, under
the condition of compression coding at a coding rate of
4 Mbit/s.